NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Deontically Constrained Policy Improvement in Reinforcement Learning Agents

Makarova, Alena; Abbas, Houssam (July 2025, College Publications)
vanBerkel, Kees; Ciabattoni, Agata; Horty, John (Ed.)
Markov Decision Processes (MDPs) are the most common model for decision making under uncertainty in the Machine Learning community. An MDP captures nondeterminism, probabilistic uncertainty, and an explicit model of action. A Reinforcement Learning (RL) agent learns to act in an MDP by maximizing a utility function. This paper considers the problem of learning a decision policy that maximizes utility subject to satisfying a constraint expressed in deontic logic. In this setup, the utility captures the agent’s mission - such as going quickly from A to B. The deontic formula represents (ethical, social, situational) constraints on how the agent might achieve its mission by prohibiting classes of behaviors. We use the logic of Expected Act Utilitarianism, a probabilistic stit logic that can be interpreted over controlled MDPs. We develop a variation on policy improvement, and show that it reaches a constrained local maximum of the mission utility. Given that in stit logic, an agent’s duty is derived from value maximization, this can be seen as a way of acting to simultaneously maximize two value functions, one of which is implicit, in a bi-level structure. We illustrate these results with experiments on sample MDPs.
more » « less
Free, publicly-accessible full text available July 1, 2026
Deontically Constrained Policy Improvement in Reinforcement Learning Agents

Makarova, Alena; Abbas, Houssam (July 2025, College Publications)

Free, publicly-accessible full text available July 1, 2026
Falsification and Control of CPS using the Language Set of Discrete-Time Temporal Logic

https://doi.org/10.1145/3716550.3722040

Abou-Mrad, Christian; Khandait, Tanmay; Pedrielli, Giulia; Abbas, Houssam (May 2025, ACM)

We present algorithms for Cyber-Physical Systems (CPS) falsification and control, which take advantage of knowing the entire language of the temporal logic specification - that is, the set of signals that satisfy the formula. In the design of CPS, falsification and control play key roles. Falsification is a testing task, where the goal is to find an input signal that causes the system's output trajectory to violate the correctness requirements. Control is the dual task, where the goal is to find an input signal that causes the system's output to satisfy the specification. When the specification is expressed in a temporal logic, most existing work relies on local optimization heuristics to perform both tasks. In this paper, we explore whether a different expression of the specification offers advantages when performing falsification and control. Recent work presented a method for computing a representation of the language of a formula in (discrete-time) Signal Temporal Logic (STL), showing that the language can be represented as a union of polytopes. We introduce new falsification algorithms which combine distance information to the different components of the language to accelerate the convergence to a falsifier. And we introduce a new algorithm for computing a satisfying control signal which works by repeatedly projecting violating output trajectories back onto the language's components. Moreover, these algorithms are trivially parallelizable to take advantage of multiple processors. Despite their relative simplicity, our algorithms demonstrate 10x to 100x speedups relative to the state-of-the-art.
more » « less
Free, publicly-accessible full text available May 6, 2026
Formal Ethical Obligations in Reinforcement Learning Agents: Verification and Policy Updates

https://doi.org/10.1609/aies.v7i1.31730

Shea-Blymyer, Colin; Abbas, Houssam (October 2024, Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society)

When designing agents for operation in uncertain environments, designers need tools to automatically reason about what agents ought to do, how that conflicts with what is actually happening, and how a policy might be modified to remove the conflict.These obligations include ethical and social obligations, permissions and prohibitions, which constrain how the agent achieves its mission and executes its policy.We propose a new deontic logic, Expected Act Utilitarian deontic logic, for enabling this reasoning at design time: for specifying and verifying the agent's strategic obligations, then modifying its policy from a reference policy to meet those obligations.Unlike approaches that work at the reward level, working at the logical level increases the transparency of the trade-offs.We introduce two algorithms: one for model-checking whether an RL agent has the right strategic obligations, and one for modifying a reference decision policy to make it meet obligations expressed in our logic.We illustrate our algorithms on DAC-MDPs which accurately abstract neural decision policies, and on toy gridworld environments.
more » « less
Full Text Available
Monitoring Signal Temporal Logic in Distributed Cyber-physical Systems

https://doi.org/10.1145/3576841.3585937

Momtaz, Anik; Abbas, Houssam; Bonakdarpour, Borzoo (May 2023, International Conference on Cyber-physical Systems)

Full Text Available
Differentiable Inference of Temporal Logic Formulas

https://doi.org/10.1109/TCAD.2022.3197506

Fronda, Nicole; Abbas, Houssam (November 2022, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)

Full Text Available
Generating Deontic Obligations From Utility-Maximizing Systems

https://doi.org/10.1145/3514094.3534163

Shea-Blymyer, Colin; Abbas, Houssam (July 2022, ACM Conference on Artificial Intelligence, Ethics and Society)

Full Text Available
Distributed Trajectory Planning for Multi-rotor UAVs with Signal Temporal Logic Objectives

https://doi.org/10.1109/CCTA49430.2022.9966096

Pant, Yash Vardhan; Abbas, Houssam; Mangharam, Rahul (August 2022, IEEE Conference on Control Technology and Applications (CCTA))

Full Text Available
Learning-‘N-Flying: A Learning-Based, Decentralized Mission-Aware UAS Collision Avoidance Scheme

https://doi.org/10.1145/3447624

Rodionova, Alëna; Pant, Yash Vardhan; Kurtz, Connor; Jang, Kuk; Abbas, Houssam; Mangharam, Rahul (October 2021, ACM Transactions on Cyber-Physical Systems)

Urban Air Mobility, the scenario where hundreds of manned and Unmanned Aircraft Systems (UASs) carry out a wide variety of missions (e.g., moving humans and goods within the city), is gaining acceptance as a transportation solution of the future. One of the key requirements for this to happen is safely managing the air traffic in these urban airspaces. Due to the expected density of the airspace, this requires fast autonomous solutions that can be deployed online. We propose Learning-‘N-Flying (LNF), a multi-UAS Collision Avoidance (CA) framework. It is decentralized, works on the fly, and allows autonomous Unmanned Aircraft System (UAS)s managed by different operators to safely carry out complex missions, represented using Signal Temporal Logic, in a shared airspace. We initially formulate the problem of predictive collision avoidance for two UASs as a mixed-integer linear program, and show that it is intractable to solve online. Instead, we first develop Learning-to-Fly (L2F) by combining (1) learning-based decision-making and (2) decentralized convex optimization-based control. LNF extends L2F to cases where there are more than two UASs on a collision path. Through extensive simulations, we show that our method can run online (computation time in the order of milliseconds) and under certain assumptions has failure rates of less than 1% in the worst case, improving to near 0% in more relaxed operations. We show the applicability of our scheme to a wide variety of settings through multiple case studies.
more » « less
Full Text Available
FADS: A framework for autonomous drone safety using temporal logic-based trajectory planning

https://doi.org/10.1016/j.trc.2021.103275

Pant, Yash Vardhan; Li, Max Z.; Rodionova, Alena; Quaye, Rhudii A.; Abbas, Houssam; Ryerson, Megan S.; Mangharam, Rahul (September 2021, Transportation Research Part C: Emerging Technologies)

Full Text Available

« Prev Next »

Search for: All records